Incorporating Visual Information into Sound Source Separation
Abstract
We present a method for improving sound source separation using vision. Sound source separation is essential to auditory scene understanding: it splits a stream of sounds generated by multiple sources into separate streams. Once the streams are separated, a recognition process such as speech recognition can work on a single stream rather than on the mixed sound of several speakers. Separation performance is known to improve with binaural microphones and microphone arrays, which provide spatial information. However, these methods still leave a positional ambiguity of around 20 degrees. In this paper, we add visual information to provide more specific and accurate position information. As a result, separation capability improved drastically. Based on our experiments, we argue that integrating visual and auditory sensory inputs improves cognitive tasks such as auditory stream separation.
Similar resources
Bayesian Source Separation and Localization
The problem of mixed signals occurs in many different contexts; one of the most familiar being acoustics. The forward problem in acoustics consists of finding the sound pressure levels at various detectors resulting from sound signals emanating from the active acoustic sources. The inverse problem consists of using the sound recorded by the detectors to separate the signals and recover the orig...
Incorporating Audio Signals into Constructing a Visual Saliency Map
The saliency map has been proposed to identify regions that draw human visual attention. Differences of features from the surroundings are hierarchically computed for an image or an image sequence in multiple resolutions and they are fused in a fully bottom-up manner to obtain a saliency map. A video usually contains sounds, and not only visual stimuli but also auditory stimuli attract human att...
Using Vision to Improve Sound Source Separation
We present a method of improving sound source separation using vision. The sound source separation is an essential function to accomplish auditory scene understanding by separating stream of sounds generated from multiple sound sources. By separating a stream of sounds, recognition process, such as speech recognition, can simply work on a single stream, not mixed sound of several speakers. The ...
Audio Source Separation by Probabilistic Latent Component Analysis
The problem of audio source separation from a monophonic sound mixture having known instrument types but unknown timbres is presented. An improvement to the Probabilistic Latent Component Analysis (PLCA) source separation method is proposed. The technique uses a basis function dictionary to produce a first round PLCA source separation. The PLCA weights are then refined by incorporating note ons...
Audiovisual source separation
Blind source separation (BSS) can be seen as a generalization of denoising a noisy signal when several sensors are available. Each of them records the same physical phenomenon in a different way: such a diversity is then useful to separate the present signals for instance by independent component analysis (ICA) or sparse component analysis (SCA) [1]. The main objective of speech separation/extr...